As a business owner, I want to know how much pedestrian foot traffic occurs around me during the day and night, so that I can evaluate the suitability of the location and hours for my business.
Busy foot traffic in a business area during the day does not always mean busy foot traffic at night.
As a business owner, I want to know how much pedestrian foot traffic I get compared to surrounding areas, so that I can assess if it is better to adapt my hours, extend or move locations.
The flow of foot traffic during the day or night may indicate a need to adjust staffing levels for specific hours.
Significantly lower foot traffic than on surrounding streets may suggest adapting the business strategy or moving location.
Steady foot traffic from early morning to mid-afternoon only may suggest matching business hours to that window.
At the end of this use case you will understand how to retrieve pedestrian data from the City of Melbourne open data platform, combine the counts with sensor locations, and compare day and night foot traffic across locations and across the pre- and post-Covid periods.
The datasets we examine for day and night traffic are:
Pedestrian Counting System - Monthly (counts per hour)
Pedestrian Counting System - Sensor Locations
The monthly hourly counts dataset contains traffic collected from sensors since 2009 and is updated monthly. It contains 10 fields, including sensor ID, date and time, year, month, day of month, day of week, time of day, sensor name, and hourly counts.
The sensor locations dataset describes the sensors that detect the pedestrian traffic. It contains 11 fields; the main ones of interest are sensor description, latitude, longitude, location, and direction of reading.
#Libraries to be installed
#The -q flag reduces pip's output
!pip -q install sodapy
!pip -q install seaborn
!pip -q install pandas
!pip -q install matplotlib
!pip -q install numpy
!pip -q install nbconvert
!pip -q install keyboard
!pip -q install geopandas
!pip -q install scikit-learn
!pip -q install folium
#load libraries
import os
import time
import keyboard
from datetime import datetime
import numpy as np
import pandas as pd
from sodapy import Socrata
import seaborn as sns
from sklearn.cluster import KMeans
from sklearn.preprocessing import MinMaxScaler
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt
%matplotlib inline
import matplotlib.cm as cm
import matplotlib.colors as colors
from matplotlib import style
style.use('ggplot')
import plotly.graph_objs as go
import plotly.express as px
import warnings
warnings.filterwarnings('ignore')
import folium
from folium.plugins import MarkerCluster
#Socrata client connection
client = Socrata('data.melbourne.vic.gov.au', '9UAGAvkem9fqXDhBLElapjazL', None)
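By default, Socrata's `client.get` returns at most 1,000 records, which is why the full traffic download later passes an explicit `limit`. As a minimal sketch of an alternative, paged SoQL parameters can be built like this (the `year` and `date_time` field names come from the traffic dataset described above; the page size is an arbitrary choice):

```python
# Build SoQL query parameters for paged downloads, since the
# Socrata API returns at most 1,000 records unless a limit is set.
def soql_params(year_from, page, page_size=50000):
    return {
        "where": f"year >= {year_from}",  # filter on the dataset's year field
        "order": "date_time",             # stable ordering makes paging deterministic
        "limit": page_size,
        "offset": page * page_size,       # skip the pages already fetched
    }

# Page 2 of records from 2019 onwards:
params = soql_params(2019, page=2)
# Usage (not run here): client.get("b2ak-trbp", **params)
```

Paging keeps memory use bounded and lets an interrupted download resume from the last completed page.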
#Pedestrian sensor location data
sensor_data_id = "h57g-5234"
results = client.get(sensor_data_id)
sensor_location = pd.DataFrame.from_records(results)
#Cast coordinates to float, and keep copies as lat/lon for mapping
sensor_location[['latitude', 'longitude']] = sensor_location[['latitude', 'longitude']].astype(float)
sensor_location = sensor_location.drop('location', axis=1)
sensor_location[['lat', 'lon']] = sensor_location[['latitude', 'longitude']]
#View sensor data
sensor_location.info()
sensor_location.head(5)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 91 entries, 0 to 90
Data columns (total 12 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 sensor_id 91 non-null object
1 sensor_description 91 non-null object
2 sensor_name 91 non-null object
3 installation_date 91 non-null object
4 status 91 non-null object
5 note 10 non-null object
6 latitude 91 non-null float64
7 longitude 91 non-null float64
8 direction_1 78 non-null object
9 direction_2 78 non-null object
10 lat 91 non-null float64
11 lon 91 non-null float64
dtypes: float64(4), object(8)
memory usage: 8.7+ KB
| | sensor_id | sensor_description | sensor_name | installation_date | status | note | latitude | longitude | direction_1 | direction_2 | lat | lon |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 16 | Australia on Collins | Col270_T | 2009-03-30T00:00:00.000 | R | Device moved to location ID 53 (22/09/2015) | -37.815734 | 144.965210 | NaN | NaN | -37.815734 | 144.965210 |
| 1 | 50 | Faraday St-Lygon St (West) | Lyg309_T | 2017-11-30T00:00:00.000 | A | NaN | -37.798082 | 144.967210 | South | North | -37.798082 | 144.967210 |
| 2 | 73 | Bourke St - Spencer St (South) | Bou655_T | 2020-10-02T00:00:00.000 | I | NaN | -37.816957 | 144.954154 | East | West | -37.816957 | 144.954154 |
| 3 | 66 | State Library - New | QVN_T | 2020-04-06T00:00:00.000 | A | NaN | -37.810578 | 144.964443 | South | North | -37.810578 | 144.964443 |
| 4 | 59 | Building 80 RMIT | RMIT_T | 2019-02-13T00:00:00.000 | A | NaN | -37.808256 | 144.963049 | South | North | -37.808256 | 144.963049 |
#Pedestrian foot count data
#Uncomment the below to open data source, download sensor data, and store it as a csv locally.
# sensor_data_id = "b2ak-trbp"
# results = client.get(sensor_data_id, limit=7000000)
# ds_traffic = pd.DataFrame.from_records(results)
# ds_traffic.to_csv('sensor_traffic.csv', index=False)
sensor_traffic = pd.read_csv('sensor_traffic.csv')
sensor_traffic.info()
print(f'\nThe shape of dataset is {sensor_traffic.shape}. \n')
print('Below are the first few rows of this dataset: ')
sensor_traffic.head(10)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4562230 entries, 0 to 4562229
Data columns (total 10 columns):
# Column Dtype
--- ------ -----
0 id int64
1 date_time object
2 year int64
3 month object
4 mdate int64
5 day object
6 time int64
7 sensor_id int64
8 sensor_name object
9 hourly_counts int64
dtypes: int64(6), object(4)
memory usage: 348.1+ MB
The shape of dataset is (4562230, 10).
Below are the first few rows of this dataset:
| | id | date_time | year | month | mdate | day | time | sensor_id | sensor_name | hourly_counts |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2887628 | 2019-11-01T17:00:00.000 | 2019 | November | 1 | Friday | 17 | 34 | Flinders St-Spark La | 300 |
| 1 | 2887629 | 2019-11-01T17:00:00.000 | 2019 | November | 1 | Friday | 17 | 39 | Alfred Place | 604 |
| 2 | 2887630 | 2019-11-01T17:00:00.000 | 2019 | November | 1 | Friday | 17 | 37 | Lygon St (East) | 216 |
| 3 | 2887631 | 2019-11-01T17:00:00.000 | 2019 | November | 1 | Friday | 17 | 40 | Lonsdale St-Spring St (West) | 627 |
| 4 | 2887632 | 2019-11-01T17:00:00.000 | 2019 | November | 1 | Friday | 17 | 36 | Queen St (West) | 774 |
| 5 | 2887633 | 2019-11-01T17:00:00.000 | 2019 | November | 1 | Friday | 17 | 29 | St Kilda Rd-Alexandra Gardens | 644 |
| 6 | 2887634 | 2019-11-01T17:00:00.000 | 2019 | November | 1 | Friday | 17 | 42 | Grattan St-Swanston St (West) | 453 |
| 7 | 2887635 | 2019-11-01T17:00:00.000 | 2019 | November | 1 | Friday | 17 | 43 | Monash Rd-Swanston St (West) | 387 |
| 8 | 2887636 | 2019-11-01T17:00:00.000 | 2019 | November | 1 | Friday | 17 | 44 | Tin Alley-Swanston St (West) | 27 |
| 9 | 2887637 | 2019-11-01T17:00:00.000 | 2019 | November | 1 | Friday | 17 | 35 | Southbank | 2691 |
#Parse date_time once, then derive date and day-of-week columns
dt = pd.to_datetime(sensor_traffic.date_time)
sensor_traffic['date'] = dt.dt.date
sensor_traffic['dow'] = dt.dt.day_of_week
#Convert sensor_id to integer in both datasets so they merge cleanly
sensor_traffic['sensor_id'] = sensor_traffic['sensor_id'].astype(int)
sensor_location['sensor_id'] = sensor_location['sensor_id'].astype(int)
#Merge pedestrian sensor location and foot traffic datasets
sensor_ds = pd.merge(sensor_traffic, sensor_location, on='sensor_id')
#Use the year to split counts: years before 2020 count as pre-Covid, 2020 onwards as post-Covid
sensor_ds['pre2020_hourly_counts'] = np.where(sensor_ds['year']<2020, sensor_ds['hourly_counts'], 0)
sensor_ds['post2019_hourly_counts'] = np.where(sensor_ds['year']>2019, sensor_ds['hourly_counts'], 0)
#Add columns for day (5am to 5pm) and night (6pm to 4am) traffic
sensor_ds['day_counts'] = np.where((sensor_ds['time']>4) & (sensor_ds['time']<18), sensor_ds['hourly_counts'], 0)
sensor_ds['night_counts'] = np.where(sensor_ds['day_counts']==0, sensor_ds['hourly_counts'], 0)
sensor_ds.describe()
| | id | year | mdate | time | sensor_id | hourly_counts | dow | latitude | longitude | lat | lon | pre2020_hourly_counts | post2019_hourly_counts | day_counts | night_counts |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 4.562230e+06 | 4.562230e+06 | 4.562230e+06 | 4.562230e+06 | 4.562230e+06 | 4.562230e+06 | 4.562230e+06 | 4.562230e+06 | 4.562230e+06 | 4.562230e+06 | 4.562230e+06 | 4.562230e+06 | 4.562230e+06 | 4.562230e+06 | 4.562230e+06 |
| mean | 2.281708e+06 | 2.017333e+03 | 1.574675e+01 | 1.146982e+01 | 2.735892e+01 | 4.926315e+02 | 3.001057e+00 | -3.781341e+01 | 1.449621e+02 | -3.781341e+01 | 1.449621e+02 | 3.986104e+02 | 9.402113e+01 | 3.575141e+02 | 1.351174e+02 |
| std | 1.317918e+06 | 3.580411e+00 | 8.799595e+00 | 6.938244e+00 | 1.978492e+01 | 7.475427e+02 | 2.000297e+00 | 6.212739e-03 | 8.555391e-03 | 6.212739e-03 | 8.555391e-03 | 7.386555e+02 | 2.969237e+02 | 7.097362e+02 | 3.894963e+02 |
| min | 1.000000e+00 | 2.009000e+03 | 1.000000e+00 | 0.000000e+00 | 1.000000e+00 | 0.000000e+00 | 0.000000e+00 | -3.782402e+01 | 1.449297e+02 | -3.782402e+01 | 1.449297e+02 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 |
| 25% | 1.140558e+06 | 2.015000e+03 | 8.000000e+00 | 5.000000e+00 | 1.100000e+01 | 4.300000e+01 | 1.000000e+00 | -3.781863e+01 | 1.449587e+02 | -3.781863e+01 | 1.449587e+02 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 |
| 50% | 2.281116e+06 | 2.018000e+03 | 1.600000e+01 | 1.100000e+01 | 2.400000e+01 | 1.770000e+02 | 3.000000e+00 | -3.781381e+01 | 1.449651e+02 | -3.781381e+01 | 1.449651e+02 | 5.400000e+01 | 0.000000e+00 | 2.400000e+01 | 0.000000e+00 |
| 75% | 3.421673e+06 | 2.020000e+03 | 2.300000e+01 | 1.700000e+01 | 4.200000e+01 | 6.010000e+02 | 5.000000e+00 | -3.781102e+01 | 1.449669e+02 | -3.781102e+01 | 1.449669e+02 | 4.350000e+02 | 3.000000e+01 | 3.500000e+02 | 5.900000e+01 |
| max | 4.567701e+06 | 2.022000e+03 | 3.100000e+01 | 2.300000e+01 | 8.700000e+01 | 1.597900e+04 | 6.000000e+00 | -3.779432e+01 | 1.449747e+02 | -3.779432e+01 | 1.449747e+02 | 1.597900e+04 | 1.443700e+04 | 1.161200e+04 | 1.597900e+04 |
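The day/night split above can be sanity-checked on a few sample hours. This sketch mirrors the `np.where` logic (hours 5 through 17 count as day, everything else as night); note that a daytime hour with a zero count falls through to `night_counts` under this rule, though the value added is zero either way:

```python
import numpy as np
import pandas as pd

# Toy rows: hour of day and an hourly count for each.
toy = pd.DataFrame({"time": [0, 4, 5, 12, 17, 18, 23],
                    "hourly_counts": [10, 20, 30, 40, 50, 60, 70]})

# Same rule as above: 5am-5pm inclusive is day, the rest is night.
toy["day_counts"] = np.where((toy["time"] > 4) & (toy["time"] < 18),
                             toy["hourly_counts"], 0)
toy["night_counts"] = np.where(toy["day_counts"] == 0,
                               toy["hourly_counts"], 0)

# Every count lands in exactly one of the two columns.
print(toy[["time", "day_counts", "night_counts"]])
```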
flag_value = 0
df_day = sensor_ds.query("day_counts > @flag_value")
print('Day info')
df_day.info()
df_night = sensor_ds.query("day_counts == @flag_value")
print('Night info')
df_night.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 2455152 entries, 0 to 4562223
Data columns (total 27 columns):
# Column Dtype
--- ------ -----
0 id int64
1 date_time object
2 year int64
3 month object
4 mdate int64
5 day object
6 time int64
7 sensor_id int32
8 sensor_name_x object
9 hourly_counts int64
10 date object
11 dow int64
12 sensor_description object
13 sensor_name_y object
14 installation_date object
15 status object
16 note object
17 latitude float64
18 longitude float64
19 direction_1 object
20 direction_2 object
21 lat float64
22 lon float64
23 pre2020_hourly_counts int64
24 post2019_hourly_counts int64
25 day_counts int64
26 night_counts int64
dtypes: float64(4), int32(1), int64(10), object(12)
memory usage: 515.1+ MB
Day info
<class 'pandas.core.frame.DataFrame'>
Int64Index: 2107078 entries, 1 to 4562229
Data columns (total 27 columns):
# Column Dtype
--- ------ -----
0 id int64
1 date_time object
2 year int64
3 month object
4 mdate int64
5 day object
6 time int64
7 sensor_id int32
8 sensor_name_x object
9 hourly_counts int64
10 date object
11 dow int64
12 sensor_description object
13 sensor_name_y object
14 installation_date object
15 status object
16 note object
17 latitude float64
18 longitude float64
19 direction_1 object
20 direction_2 object
21 lat float64
22 lon float64
23 pre2020_hourly_counts int64
24 post2019_hourly_counts int64
25 day_counts int64
26 night_counts int64
dtypes: float64(4), int32(1), int64(10), object(12)
memory usage: 442.1+ MB
Night info
Separate day and night, and before and after Covid for mapping
#Split the day dataset to map the difference before and after Covid-19
df = df_day
print(df.head(2))
df_daypft_beforecovid = df.loc[df['year']<2020]
df_daypft_aftercovid = df.loc[df['year']>2019]
#Average hourly count for each sensor during each period
df_daypft_beforecovid_avg = df_daypft_beforecovid[['sensor_id','sensor_description','lat','lon','hourly_counts']]
df_daypft_beforecovid_avg = df_daypft_beforecovid_avg.groupby(['sensor_id','sensor_description','lat','lon'], as_index=False).agg({'hourly_counts': 'mean'})
df_daypft_aftercovid_avg = df_daypft_aftercovid[['sensor_id','sensor_description','lat','lon','hourly_counts']]
df_daypft_aftercovid_avg = df_daypft_aftercovid_avg.groupby(['sensor_id','sensor_description','lat','lon'], as_index=False).agg({'hourly_counts': 'mean'})
id date_time year month mdate day time \
0 2887628 2019-11-01T17:00:00.000 2019 November 1 Friday 17
12 2888288 2019-11-02T05:00:00.000 2019 November 2 Saturday 5
sensor_id sensor_name_x hourly_counts ... latitude longitude \
0 34 Flinders St-Spark La 300 ... -37.81538 144.974151
12 34 Flinders St-Spark La 7 ... -37.81538 144.974151
direction_1 direction_2 lat lon pre2020_hourly_counts \
0 East West -37.81538 144.974151 300
12 East West -37.81538 144.974151 7
post2019_hourly_counts day_counts night_counts
0 0 300 0
12 0 7 0
[2 rows x 27 columns]
#Split the night dataset to see the difference before and after Covid-19
df = df_night
print(df.head(2))
df_night_beforecovid = df.loc[df['year']<2020]
df_night_aftercovid = df.loc[df['year']>2019]
#Average hourly count for each sensor during each period
df_night_beforecovid_avg = df_night_beforecovid[['sensor_id','sensor_description','lat','lon','pre2020_hourly_counts']]
df_night_beforecovid_avg = df_night_beforecovid_avg.groupby(['sensor_id','sensor_description','lat','lon'], as_index=False).agg({'pre2020_hourly_counts': 'mean'})
df_night_aftercovid_avg = df_night_aftercovid[['sensor_id','sensor_description','lat','lon','post2019_hourly_counts']]
df_night_aftercovid_avg = df_night_aftercovid_avg.groupby(['sensor_id','sensor_description','lat','lon'], as_index=False).agg({'post2019_hourly_counts': 'mean'})
id date_time year month mdate day time \
1 2887683 2019-11-01T18:00:00.000 2019 November 1 Friday 18
2 2887738 2019-11-01T19:00:00.000 2019 November 1 Friday 19
sensor_id sensor_name_x hourly_counts ... latitude longitude \
1 34 Flinders St-Spark La 240 ... -37.81538 144.974151
2 34 Flinders St-Spark La 158 ... -37.81538 144.974151
direction_1 direction_2 lat lon pre2020_hourly_counts \
1 East West -37.81538 144.974151 240
2 East West -37.81538 144.974151 158
post2019_hourly_counts day_counts night_counts
1 0 0 240
2 0 0 158
[2 rows x 27 columns]
Pedestrian traffic has decreased since 2020; the aim is to understand the patterns of day and night traffic behind that change.
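The size of that decrease can be quantified directly. A minimal sketch with toy numbers (the real computation would use `sensor_ds['hourly_counts']` split on `year` in the same way):

```python
import pandas as pd

# Toy hourly counts: three pre-2020 rows and three post-2019 rows.
toy = pd.DataFrame({"year": [2018, 2019, 2019, 2020, 2021, 2021],
                    "hourly_counts": [600, 500, 400, 150, 100, 50]})

# Mean hourly count in each period, and the relative change.
pre = toy.loc[toy["year"] < 2020, "hourly_counts"].mean()
post = toy.loc[toy["year"] > 2019, "hourly_counts"].mean()
pct_change = 100 * (post - pre) / pre

print(f"pre: {pre:.0f}, post: {post:.0f}, change: {pct_change:.0f}%")
# → pre: 500, post: 100, change: -80%
```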
#Examine pre-Covid and post-2019 foot traffic by hour of day
ds = sensor_ds.groupby("time")[["pre2020_hourly_counts", "post2019_hourly_counts"]].mean()
df = ds.sort_index()
axs = df.plot.line(figsize=(20, 6), color=["#0f9295","orange"])
axs.set_title('Foot Traffic by Time', size=20)
axs.set_ylabel('Average counts', size=14)
plt.show()
#Average traffic by year, split by day and night
pivot = pd.pivot_table(sensor_ds, values=['day_counts','night_counts'], index=['year'], aggfunc='mean')
rs = pivot.sort_values(by='year', ascending=False)
axs = rs.plot.line(figsize=(12, 5), color=['orange','#056b8a'], legend=True);
axs.set_title('Foot traffic by year by day and night', size=20)
axs.set_ylabel('Average counts', size=14)
plt.show()
#Examine pre-Covid and post-2019 foot traffic by day of week
ds = sensor_ds.groupby(["dow","day"])[["pre2020_hourly_counts", "post2019_hourly_counts"]].mean()
df = ds.sort_values(by=['dow'])
axs = df.plot.bar(figsize=(12, 4), color=["#0f9295","orange"])
axs.set_title('Foot Traffic by Day of Week - All Hours', size=20)
axs.set_ylabel('Average hourly counts', size=14)
plt.show()
The top 20 locations by foot traffic overall are displayed below.
#Top 20 locations by average day traffic
pivot = pd.pivot_table(sensor_ds, values='day_counts', index=['sensor_id','sensor_description'], aggfunc='mean')
pivot_ds = pivot['day_counts'].nlargest(n=20)
pivot_ds.plot.bar(figsize=(12, 5), color='orange', legend=True);
#Top 20 locations by average night traffic
pivot = pd.pivot_table(sensor_ds, values='night_counts', index=['sensor_id','sensor_description'], aggfunc='mean')
pivot_ds = pivot['night_counts'].nlargest(n=20)
axs = pivot_ds.plot.bar(figsize=(12, 5), color='#056b8a', legend=True);
axs.set_title('Top 20: Foot Traffic by location by day and night', size=20)
axs.set_ylabel('Average counts', size=14)
plt.show()
We will investigate if the traffic volumes by location are different depending on day and night traffic.
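One simple way to compare the rankings is the overlap between the top-N day locations and the top-N night locations. A sketch with toy sensor averages (the real code would rank the day and night pivots built above; the sensor names here are illustrative):

```python
import pandas as pd

# Toy mean counts per sensor for day and night.
day = pd.Series({"Southbank": 2691, "Bourke St": 900,
                 "Alfred Pl": 604, "Spark La": 300})
night = pd.Series({"Southbank": 1800, "Spark La": 240,
                   "Queen St": 200, "Bourke St": 150})

# Top-n sensors in each period, and the locations they share.
n = 3
top_day = set(day.nlargest(n).index)
top_night = set(night.nlargest(n).index)
shared = top_day & top_night

print(sorted(shared))  # → ['Southbank']
```

A small overlap means the busiest daytime streets are not the busiest nighttime streets, which is exactly the signal a business owner choosing hours or location would want.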
#Top 20 locations by average pre-Covid day traffic
pivot = pd.pivot_table(df_day, values='pre2020_hourly_counts', index=['sensor_id','sensor_description'], aggfunc='mean')
pivot_ds = pivot['pre2020_hourly_counts'].nlargest(n=20)
pivot_ds.plot.line(figsize=(12, 5), color='#0f9295', legend=True, rot=90);
#Top 20 locations by average post-2019 day traffic
pivot = pd.pivot_table(df_day, values='post2019_hourly_counts', index=['sensor_id','sensor_description'], aggfunc='mean')
pivot_ds = pivot['post2019_hourly_counts'].nlargest(n=20)
axs = pivot_ds.plot.bar(figsize=(12, 5), color='orange', legend=True, rot=90);
axs.set_title('Top 20: Foot Traffic by location by day', size=20)
axs.set_ylabel('Average counts', size=14)
plt.show()
The chart above shows how significantly traffic has changed for the top 20 locations since Covid began in 2020.
The map below shows how traffic at each sensor changed before and after Covid. Click the layer icon in the top right corner of the map and select a layer to see traffic before Covid.
#Visualise day data
m = folium.Map(location=[-37.8167, 144.967], zoom_start=15)  # tiles='Stamen Toner'
#Weight each sensor by its average hourly count by repeating its coordinates
locations = []
for i in range(len(df_daypft_beforecovid_avg)):
    row = df_daypft_beforecovid_avg.iloc[i]
    locations += [(row.lat, row.lon)] * int(row['hourly_counts'])
marker_cluster = MarkerCluster(
    name='day traffic before covid-19',
    locations=locations,
    overlay=True,
    control=True,
    show=False  # hidden by default - select the layer to show these data points
)
marker_cluster.add_to(m)
#Next layer: day traffic after Covid-19
locations = []
for i in range(len(df_daypft_aftercovid_avg)):
    row = df_daypft_aftercovid_avg.iloc[i]
    locations += [(row.lat, row.lon)] * int(row['hourly_counts'])
marker_cluster = MarkerCluster(
    name='day traffic after covid-19',
    locations=locations,
    overlay=True,
    control=True,
)
marker_cluster.add_to(m)
folium.LayerControl().add_to(m)
m